Spatial Reasoning





Supplementary Material for "Diversifying Spatial-Temporal Perception for Video Domain Generalization" Kun-Yu Lin

Neural Information Processing Systems

Hard Norm Alignment (HNA) loss: we apply the HNA loss (Eq. …) and report results on UCF→HMDB, which demonstrate the effectiveness of our model. First, we drop the feature from a specific spatial group.

Method       UCF→HMDB
STDN-T-1     59.2
STDN-T-2     58.1
STDN-T-3     59.4
STDN-T-4     58.9
Full STDN    60.2

Second, we drop the feature from a specific space scale. In our main manuscript, we conduct all experiments based on ResNet-50.





Topological Spatial Graph Coarsening

Calissano, Anna, Lasalle, Etienne

arXiv.org Machine Learning

Spatial graphs are graphs whose nodes are localized in space (e.g., public transport networks, molecules, branching biological structures). In this work, we consider the problem of spatial graph reduction, which aims to find a smaller spatial graph (i.e., with fewer nodes) that has the same overall structure as the initial one. In this context, performing the graph reduction while preserving the main topological features of the initial graph is particularly relevant, due to the additional spatial information. Thus, we propose a topological spatial graph coarsening approach based on a new framework that finds a trade-off between graph reduction and preservation of topological characteristics. The coarsening is realized by collapsing short edges. In order to capture the topological information required to calibrate the reduction level, we adapt the construction of classical topological descriptors made for point clouds (the so-called persistence diagrams) to spatial graphs. This construction relies on the introduction of a new filtration called the triangle-aware graph filtration. Our coarsening approach is parameter-free, and we prove that it is equivariant under rotations, translations, and scaling of the initial spatial graph. We evaluate the performance of our method on synthetic and real spatial graphs, and show that it significantly reduces graph sizes while preserving the relevant topological information.
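The core coarsening step described above, collapsing short edges of a spatial graph, can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes a fixed length threshold `tau` (the paper calibrates the reduction level from topological descriptors and is parameter-free), 2D node positions, and simple union-find merging with each cluster placed at its centroid. All names (`collapse_short_edges`, `tau`) are hypothetical.

```python
# Hedged sketch of spatial graph coarsening by collapsing short edges.
# Assumption (not from the paper): a fixed threshold `tau` replaces the
# paper's topology-driven, parameter-free calibration of the reduction level.
import math


def collapse_short_edges(positions, edges, tau):
    """positions: {node: (x, y)}; edges: list of (u, v) pairs; tau: length threshold."""
    parent = {n: n for n in positions}

    def find(n):  # union-find root with path compression
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    def length(u, v):  # Euclidean edge length in the spatial embedding
        (x1, y1), (x2, y2) = positions[u], positions[v]
        return math.hypot(x1 - x2, y1 - y2)

    # Merge the endpoints of every edge shorter than tau, shortest first.
    for u, v in sorted(edges, key=lambda e: length(*e)):
        if length(u, v) < tau:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[rv] = ru

    # One coarse node per cluster, placed at the cluster centroid.
    clusters = {}
    for n in positions:
        clusters.setdefault(find(n), []).append(n)
    coarse_pos = {
        r: (sum(positions[m][0] for m in ms) / len(ms),
            sum(positions[m][1] for m in ms) / len(ms))
        for r, ms in clusters.items()
    }
    # Surviving edges between distinct clusters, deduplicated.
    coarse_edges = {tuple(sorted((find(u), find(v))))
                    for u, v in edges if find(u) != find(v)}
    return coarse_pos, sorted(coarse_edges)
```

For example, on a 3-node path graph where the first edge is much shorter than the threshold, the two nearby nodes merge into a single coarse node while the long edge survives.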



Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models

Neural Information Processing Systems

Large language models (LLMs) have exhibited impressive performance in language comprehension and various reasoning tasks. However, their abilities in spatial reasoning, a crucial aspect of human cognition, remain relatively unexplored. Humans possess a remarkable ability to create mental images of unseen objects and actions through a process known as the Mind's Eye, enabling imagination of the unseen world. Inspired by this cognitive capacity, we propose Visualization-of-Thought (VoT) prompting. VoT aims to elicit the spatial reasoning of LLMs by visualizing their reasoning traces, thereby guiding subsequent reasoning steps.
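The prompting idea described above, asking the model to emit a visualization of its intermediate state after each reasoning step, can be sketched as below. The instruction wording and the ASCII-grid state format are illustrative assumptions, not the exact prompt or format from the paper; `vot_prompt` and `render_grid` are hypothetical helpers.

```python
# Hedged sketch of VoT-style prompting: the instruction text and the
# ASCII-grid visualization format are assumptions, not the paper's prompt.

def vot_prompt(task: str) -> str:
    """Wrap a spatial task with an instruction to visualize each step."""
    return (
        f"{task}\n\n"
        "After each reasoning step, visualize the current state as an ASCII "
        "grid before continuing. Use the visualization to guide the next step."
    )


def render_grid(width: int, height: int, marks: dict) -> str:
    """Render one state as an ASCII grid; marks maps (row, col) -> char."""
    return "\n".join(
        "".join(marks.get((r, c), ".") for c in range(width))
        for r in range(height)
    )
```

For instance, a 3x2 grid with an agent "X" at row 0, column 1 renders as ".X." over "...", the kind of intermediate state the model would be prompted to produce between steps.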